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In this work we apply techniques and modus operandi typical of Statistical Mechanics to a large 
dataset about key social quantifiers and compare the resulting behaviours of five European nations, 
namely France, Germany, Italy, Spain and Switzerland. The social quantifiers considered are i. 
the evolution of the number of autochthonous marriages (i.e. between two natives) within a given 
territorial district and ii. the evolution of the number of mixed marriages (i.e. between a native and 
an immigrant) within a given territorial district. Our investigations are twofold. From a theoret¬ 
ical perspective, we develop novel techniques, complementary to classical methods (e.g. historical 
series and logistic regression), in order to detect possible collective features underlying the empir¬ 
ical behaviours; from an experimental perspective, we evidence a clear outline for the evolution 
of the social quantifiers considered. The comparison between experimental results and theoretical 
predictions is excellent and allows speculating that France, Italy and Spain display a certain degree 
of internal heterogeneity, that is not found in Germany and Switzerland; such heterogeneity, quite 
mild in France and in Spain, is not negligible in Italy and highlights quantitative differences in the 
customs of Northern and Southern regions. These findings may suggest the persistence of two cul¬ 
turally distinct communities, long-term lasting heritages of different and well-established cultures. 
We also find qualitative differences in the evolution of autochthonous and mixed marriages: for the 
former imitation in decisional mechanisms seems to play a key role (and this results in a square root 
relation between the number of autochthonous marriages versus the percentage of possible couples 
inside that country), while for the latter the emerging behaviour can be recovered (in most cases) 
with elementary models with no interactions, suggesting weak imitation patterns between natives 
and migrants (and this translates in a linear growth for the number of mixed marriages versus the 
percentage of possible mixed couples in the country). However, the case of mixed marriages dis¬ 
plays a more complex phenomenology, where further details (e.g., the provenance and the status of 
migrants, linguistic barriers, etc.) should also be accounted for. 


I. INTRODUCTION 

In the past decade huge datasets have been captured, stored and shared to the purpose of predictive analytics. 
This was allowed and prompted by a number of reasons, ranging from novel techniques for massive data acquisition 
(especially in Biotechnological and Medical Research [TJ[2]) to the genesis of suitable repository (clouds) merging data 
collected from various sources/laboratories (especially in Sociological and Economical Research [3, Tj). As a matter of 
fact, researchers are nowadays provided with extensive data whose hidden structures may escape classical inferential 
methods, hence driving the quest for proper techniques to extrapolate these inner contents. 

In this context, a novel generation of Machine Learning (e.g., Deep Learning) has been devised for data mining 
[5] and tools from Statistical Mechanics have been developed to extract predictive information from the available 
data sets. Indeed, being firmly grounded on the law of large numbers and on the minimum energy and maximum 
entropy principles [6], Statistical Mechanics constitutes a solid approach for many applied fields, far beyond its original 
scope. For instance, in the last decade, it has been robustly exploited in economic complexity (see e.g., [UEHSI), 
social complexity (see e.g., [TOHIU] ). and even in medicine and pharmaceutical (see e.g., [TiUTB] ). just to cite a few. 
In particular, as for the investigation of social complexity, one can rely on a bulk of stochastic techniques (see e.g. 
[TUI ITT, T7, T81) meant to model the underlying interacting network of the system (whose nodes are typically single 
decision makers and links mirror the existence of pair-wise interactions or reciprocal knowledge [H1I13) and to figure 
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out its emergent features. Although this perspective always implies a certain degree of simplification, its reward lies 
in its crucial ability to unveil collective behaviors that markedly shine over the fluctuating details. 

In this paper we adopt stochastic techniques stemmed from Statistical Mechanics to analyze extensive data regarding 
a key aspect of Social Complexity, that is the evolution of marriages, either between two natives (i.e., “local marriages”) 
or between a native and an immigrant (i.e., “mixed marriages”), within several European countries. 

Local and mixed marriages constitute standard social quantifiers to assess, e.g., the role of conjunctural phenomena 
on the family and the degree of migrant integration in a given region MM- The comparison between the outcomes 
pertaining to different countries will allow to highlight either robust (i.e. shared across the nations) or local (i.e. 
national) features. 

In our Statistical Mechanics analysis we look at the whole population as a set of decision makers and our order 
parameter (namely the global observable capturing the behaviour of the system) is the density M of marriages, either 
local or mixed. In order to estimate theoretically M we refer to two prototypical models. In the former model 
we postulate that each decision maker acts independently of each other, being influenced only by “external fields” 
(e.g., stemming from cultural transitions or conveyed by the media). This model constitutes an adaptation of the 
McFadden Discrete Choice theory [23] and predicts that M scales linearly with the fraction X of potential couples 
within the system, i.e., M tx X. In the other model we postulate that social interactions play a role in biasing the 
behaviour of decision makers and that such interactions are imitative. This model essentially reproduces the Brock- 
Durlauf Discrete Choice [24] and predicts M oc y/X. We stress that the existence of imitative social interactions 
within an homogeneous population is by now well evidenced and accepted; quoting Brock and Durlauf 23 125] - 
the utility an individual receives from a given action depends directly on the choice of others in that individual’s 
reference group. Dealing with mixed marriages, however, this paradigma may fail as migrants can be subject to 
cultural differences, prejudices and discrimination, ultimately segregating them and constraining their searches within 
a segmented marriage market Bung. If this is the case, following the statistical mechanical modeling, we would 
expect qualitative different behaviours for the evolution of local and mixed marriages. Indeed, as we are going to show, 
for local marriages the behaviour expected from interacting systems is nicely recovered, while for mixed marriages, in 
most cases, the behaviour expected from non-interacting systems prevails. 

More precisely, in this work we will consider empirical data (available from INSEE for France, DESTATIS for 
Germany, ISTAT for Italy, INE for Spain, and FSO for Switzerland) and, after an extensive data analysis and model 
calibration, we get to the following results: 


• Social interactions seem to play a major and robust role in driving local marriages since, for all the countries 
considered, the square-root scaling is nicely recovered; this is not the case for mixed marriages, whose evolution 
scales linearly with the number of available couples for most of the analyzed countries, suggesting a major role 
played by the independent model. 

• Focusing on local marriages, we unveil that Germany and Switzerland display internal homogeneity, that is, 
by repeating the analysis at a regional (rather than national) level of resolution, we find that the scaling law 
for local marriages is quantitatively robust over all the regions making up each country. On the other hand, 
Latin countries (i.e., France, Italy, and Spain) display internal heterogeneity, namely, the scaling law for local 
marriages is only qualitatively robust over all the regions making up each country. This effect is rather mild for 
France and Spain, but well distinguishable in Italy. In particular, in Italy one can detect two clusters of regions 
wherein homogeneity is recovered and, remarkably, these clusters turn out to correspond sharply to Northern 
and Southern regions. 


• Focusing on mixed marriages, we evidence that France and Germany essentially follow the Me Fadden pre¬ 
dictions, while Spain and Switzerland seem to obey the Brock-Daurlauf scenario, at least for small values of 
migrant’s percentage inside the country and eventually loosing the imitational mechanism for higher values of 
migrant’s percentage thus collapsing on the Me Fadden case too. For Italy extrapolation of a clear trend is 
rather hard due to a very noisy behaviour, again accountable to internal heterogeneities. However, in any case, 
the emerging phenomenology is more complex than the one found for local marriages and we claim that further 
elements (e.g., the provenance and the status of migrants, linguistic barriers, etc.) play a non marginal role. 


These results are discussed in the following sections: in Sec. [Il] we present the main results, corroborated by extensive 
data analysis; in Sec. Ill we show the techniques exploited, distinguishing between theoretical methodologies and data- 
analysis protocols; in Sec.[TV]we summarise our results and discuss possible outlooks. 
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II. RESULTS AND DISCUSSION 


Our investigations aim to tackle and describe i. the evolution of the density of autochthonous marriages in a given 
country as a function of the density of possible couples present in its population and ii. the evolution of the density of 
mixed marriages versus the density of possible mixed couples (one native and one migrant) present in its population. 
The comparison between the two outcomes can shed light on similarities and differences regarding interactions between 
natives and among natives and migrants. Further, analysis are performed on several countries and the comparison 
between the related results can highlight either robust or country-dependent features. 

Here, before presenting the results, we briefly describe the underlying theoretical scaffold and data-analysis methods, 
while we refer to the section Methods for more details and to Ref. [13 El EU for an extended treatment. 

The analysis is carried out one country per time. For any given country, we first need to fix the degree of resolution 
at a certain territorial district, in such a way that (reasonably assuming that the number of marriages in a given 
district depends mainly on the population within the district itself) different districts can be considered as effectively 
independent realizations of the same system. We decide to fix the resolution at the provincial level, as its extent 
is typically broad enough to justify the hypothesis of independence m i29j and to wipe out local phenomena as 
alienation and segregation [33], yet the number of provinces is still large enough to get a good pool for the statistical 
analysis. 

We call Np the total amount of provinces for the country considered ( Np = 96 for France, Np = 476 for Germany, 
Np = 110 for Italy, Np = 53 for Spain, and Np = 139 for Switzerland, see also : 31]), each labeled with an index i, 
namely i = 1, ...,Np. For each province we have time series of data concerning the fraction of males and of females, 
and the fraction of natives and migrants over the whole population and over a proper time window, whose extent 
depends on the country considered. Data are collected yearly and we denote with T the number of years sampled for 
the country considered (T = 5 [2006-2010] for France, T = 6 [2007-2012] for Germany, T = 6 [2005-2010] for Italy, 
T = 8 [1996,1998-2004] for Spain [52], and T = 3 [2008-2010] for Switzerland), each labeled with an index y, namely 


y = l,...,T. 

Thus, we call y the fraction of possible couples among natives (i.e., the fraction of native males times the fraction 
of native females) [53] in the province i and at time y. Similarly, we call Qi y the fraction of possible couples involving 
one native and one foreign-born (i.e., the fraction of natives times the fraction of foreign-borns) in the province i 
and at time y. It is worth stressing that, as empirically evidenced in Sec. Ill the ratio between males and females is 
constant with respect to the overall extent of the population, and this holds for both native and migrants communities 
in such a way that, the fraction of possible mixed, heterosexual couples is just proportional to VL iy . 

The available datasets also include the time series {LMj i2/ } and for, respectively, the fraction of local 

marriages and of mixed marriages with respect to the overall number of marriages occurred in the province i at time 
y. again, we refer to Sec. m for a detailed definition. 

In the following treatment, for simplicity, we will generically denote with X i y and with M iy the number of possible 
couples and of effective marriages in the province i in the year y; according to the phenomenon we mean to model, 
Xi y and will be replaced by Ti y and LM i y (dealing with local marriages) or by Cli y and MM^ y (dealing with 
mixed marriages). 

Each agent making up the system is looked at as a decision maker, the decision being whether to contract or not 
contract a marriage with another agent. There are two prototypical models for such a system of decision makers, 
one is close in spirit to the McFadden Discrete Choice Theory [23, the other is close to the Brock-Daurlauf theory 
[24 , 26]. More precisely, in the former model one assumes that each agent decides independently of the other agents 
(i.e. one-body model), while in the latter one assumes that social interactions are present and each agent is influenced 
by the choice of the other agents (i.e. an imitative two-body model). These two opposite scenarios constitute the 
extremal cases, such that the related predictions provide bounds for the evolution of the average number of marriages 
within a country. The predictions for the two models are briefly recalled hereafter (see Sec. Ill for more details): 


• Model with no interactions [McFadden scenario] 

The amount of marriages M, y , within the province i, scales linearly with the fraction of possible couples X'i tV , 
namely M i y oc X i y mm- Otherwise stated, as the time goes by, X i y changes and M i y changes too, but in 
a correlated way, that is according to a linear law. Under the assumption of homogeneity, the same holds for 
the overall number of marriages M y at the country level 


My 


X, 


yi 


(i) 


where X y is the fraction of possible couples in the whole country at time y, while M y is the fraction of effective 
marriages celebrated at time y over the whole country. 


• Model with social interactions [Brock and Durlauf scenario] 

The amount of marriages M ly within the province i scales with the square-root of X. hy , namely Mi y oc ^JX^y 
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mug. Otherwise stated, as the time goes by, X iy changes and M iy changes in a correlated way, this time 
according to a square-root law. Under the assumption of homogeneity, the same holds for marriages M y at the 
country level 


My ~ y/Xy. 


( 2 ) 


We now proceed with the data analysis and the comparison with the previous theoretical predictions (i.e., 
and [2]) , treating separately the case of local and mixed marriages. 


Eqs.[l] 
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FIG. 1: Local marriages LM versus the fraction of potential available couples T for France (panel a), Germany (panel 6), Italy 
(panel c and d), Spain (panel e), and Switzerland (panel /). The data points (circles) presented here are obtained from the raw 
data LM ijH and y representing, respectively, the number of local marriages in the province i at time y divided by the total 
number of marriages. Details on the manipulation of raw data are provided in Sec. |III| The error circles should be understood 
with respect to the legend reported in each graph. I 11 order to establish a direct comparison among the states, all the plots were 
realized dividing the interval of data in 30 bins. For each data set (circles) the best fit(s) according to Eq. [5] is also provided. 
As it is clear from the plots, and numerically confirmed by the tables presented in Fig. [2j Italy is best fitted by two square 
root curves that, remarkably, naturally split the country exactly into Northern and Southern regions as reported by Eurostat. 
The two-color map shown for Italy depicts the definitions of Northern and Southern Italy as given by the European Union 
(Eurostat data) and that perfectly matches results from our data analysis. The black circles, whose sizes mirror the related 
amount of available data, represent examples of cities whose marriages along the years 2000-2010 have been studied. 
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No. of bins 

- ES 

IlFR 

■IDE 

□ CH 

No. of bins 

HIT 

North 

South 

5 

0.897 

0.946 

0.973 

0.904 

5 

0.620 

0.895 

0.747 

10 

0.937 

0.964 

0.971 

0.968 

10 

0.742 

0.975 

0.925 

15 

0.933 

0.970 

0.933 

0.977 

15 

0.715 

0.991 

0.973 

20 

0.944 

0.969 

0.946 

0.972 

20 

0.773 

0.992 

0.970 

25 

0.940 

0.972 

0.927 

0.964 

25 

0.721 

0.991 

0.980 

30 

0.947 

0.972 

0.904 

0.960 

30 

0.788 

0.993 

0.986 


FIG. 2: These two tables show values of R 2 coupled to the best fits of data sets LMj ia versus Ffor all the countries considered 
and for several choices of binning; the case with 30 bins is the one shown in Fig. |T[ As for Italy, whatever the level of resolution 
we fix (i.e. whatever the amount of bins we use), the best fit considering the country as a whole is always significantly worst 
than the one obtained considering Northern and Southern regions separately. 


A. Analysis of local marriages 

In this section we focus on the data analysis of local marriages (i.e., marriages between two natives) within France, 
Germany, Italy, Spain and Switzerland starting from the historical series collected. 

As shown in Figure [l] the behavior predicted by the model with social interactions (eq. [2]) successfully matches data 
for all the countries considered but Italy. In fact, if analyzed as a single, homogeneous set, data for Italy are too 
noisy to detect any clear trend (see also Figure [2]), yet, by treating data for Northern Italy and for Southern Italy 
separately (as if they were two different countries), the expected square-root behavior clearly emerges. 

It is worth noticing that, to obtain this result, we split Italy in such a way that the goodness of the fit (measured in 
terms of relative R 2 ) is maximal, and this division turns out to coincide with the definition of Northern and Southern 
Italy as reported by Eurostat. 

In order to deepen this point and check whether “hidden heterogeneities” also occur in the other countries, we 
perform further analysis, where an intermediate level of resolution is introduced. More precisely, each province is now 
associated to a region, which constitutes a coarser territorial district. We call Nr the total amount of regions for the 
country considered (Nr = 22 for France, Nr = 16 for Germany, Nr = 20 for Italy, Nr = 18 for Spain, Nr = 26 for 
Switzerland), each labeled with an index a, and repeat the analysis region by region. As a results, the data series are 
reshuffled as X auV and M a . tV , where a* denotes the province i in the region a; in general, each region a include a 
different number Np a of provinces, in such a way that on = 1,..., Np a . 

Our aim is now to detect the possible existence of clusters of regions displaying different behaviours by comparing 
regional outcomes with the national average. The analysis performed is summarized in Fig. [3] for France, Fig. [4] for 
Germany, Fig. [5] for Italy, Fig. [6] for Spain, and Fig. [7] for Switzerland. First, we look at how data pertaining to 
different regions are scattered around the best-fit obtained over the whole set of data (panels a). This analysis is able 
to immediately highlight large and systematic deviations of a subset of empirical data (e.g., pertaining to a particular 
region) with respect to the expected behaviour represented by the best-fit f(T). Such deviations can further be 
quantified as follows. For each data point, say (r aji!( ,LM Qii!/ ), we calculate p ai ,y = LM ai; j / //(r Qiil( ), in such a way 
that a value of p a ,.y close to (far from) one means that the best-fit provides a good (poor) estimate for the number of 
marriages in the province on at the year y. Further, we calculate the spatial and temporal average of p ai , y , namely, 
we get p a = J2?=i J2y =l Pai, y /(T x Np a ) and this is related to F a = J2y =i F ai ,y/(T x N P J (panels b). In this 

way we can inspect the possible presence of internal heterogeneity. For instance, in the case of Switzerland, for any 
region (a couple of cases apart) p a is, within the error bar, approximately unitary. This suggests that the average 
behaviour provided by the best fit constitutes a good representation for all the regions making up the country. A 
similar outcome emerges for Germany. Here, only two city-states (i.e., Berlin and Hamburg), both corresponding to 
relatively large densities of potential couples, fall significantly far from the average value. On the other hand, for the 
remaining countries (i.e., France, Italy, and Spain), most regions exhibit a value of p ai which, still considering the 
related error, is not unitary. 

For those countries we extend the analysis further and we distinguish between regions where the number of marriages 
is, respectively, underestimated (i.e., p a > 1), overestimated (i.e., p a < 1), and finely estimated (i.e., p a « 1) by the 
best fit. These cases are reported on the chart (panels c) with a colormap mirroring the value of p a : the brighter 
the color the smaller the related p a . It is immediate to see that regions corresponding to analogous outcomes tend 
to clusterize geographically. For instance, in France, the North-Eastern part and the Southern part form two distinct 
blocks. In Spain, the Southern part and the Basque region (including some adjacent regions) display analogous 
outcomes opposed to the West-most part. Finally, in Italy, the discrepancy between the two blocks is most evident 
and places side by side Northern and Southern regions. 
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One can therefore treat these clusters as different entities and derive the best fit for each of them separately and 
independently (panels d). The two best fits corresponding to regions with p a < 1 and with p a > 1, respectively, 
are truly shifted only for Italy. This suggests that the heterogeneities emerging for France and Spain are weaker or, 
possibly, of different nature than those pertaining to Italy. 


B. Analysis of mixed marriages 

We now analyze, in a similar fashion, marriages between native and foreign-born citizens. The theoretical framework 
underlying this kind of phenomenology is analogous to the one used above, but here the two parties are played by 
native and by foreign-born individuals (rather than males and females, despite the constraint of heterogamy clearly 
connects these sets of variables as explained in section Methods); we refer to [T3J [55] for a full treatment. 

We check the theoretical law given by Eqs. [Tj and [2| versus the available experimental data for the above mentioned 
five countries and we report our findings in Figure [8] 

First, we notice that different countries display qualitatively different behaviours: the growth of mixed marriages is 
well described by a square-root law (at least for small values of f2) in Spain and in Switzerland, while in France and in 
Germany it is better described by a linear law; in Italy the behaviour is even more complex as data are rather noisy 
and two distinct trends emerge. Therefore, imitative interactions among native and foreign-born agents still seem 
to play a major role in Spain and in Switzerland, while in France and in Germany the presence of “external fields” 
seem to prevail. In Italy, the two trends can be associated to Northern and Southern regions, hence confirming strong 
discrepancies between the two geographical areas also as for immigrant integration. 

Moreover, in general, the goodness of the fits is lower than the case of autochthonous marriages, although the size 
of the available data sets are the same. This suggests that a certain degree of non-uniformity may affect the data 
considered; for instance, no clusterization of data based on e.g., the provenance or the status of the foreign-born spouse 
is possible due to a lack of information. We also expect that the existence of large, well-established ethnical group 
may influence the degree of integration (in terms of mixed marriages) of individuals belonging to the group itself, 
but a complete quantification of this correlation is currently out of reach. These features may by crucial in future 
outlooks for getting a robust and sound picture of the phenomenology. For instance, here we just notice that most 
immigrants in Switzerland come from European (EU-28/EFTA) countries (in particular, « 40% of the immigrants 
come from the neighboring European countries and only ss 12% come from America and Africa) and the number of 
foreign citizens living in the country is almost one fourth of the permanent resident population [32j . In Germany, the 
fraction of foreign-born individuals is S3 12% and, being the second most popular migration destination in the world 
(after the United States), it attracts a wide-ranging class of immigrants (from refugees to high-professional figures) 
[34]. The flow of immigrants is very large in France as well with « 11% of foreign born individuals, of which ss 32% 
come from Europe and « 43% come from Africa (mostly from French-speaking countries) [35] . Immigration from 
(former) colonies is significant also in Spain, where ~ 25% of the immigrants come from South and Central America, 
ss 41% come from European (EU-28/EFTA) countries and « 18% from Africa [33]. As for Italy, almost one fourth of 
the foreign-born population comes from Romania, another fourth coming from Albania, Morocco and China together 

mi¬ 


ni. METHODS 


In this section we deepen the Statistical Mechanics approach underlying our analysis and leading to Eqs. [T] and [2] 
as well as the methodology followed during data analysis and leading to the empirical evidence described above. 


A. The theoretical protocol 

The theoretical modeling is split in two parts, each covering one of the two models used as reference guide. We 
first discuss the one-body theory (independent model) and then move to the two-body theory (model with social 
interactions). 

In both cases we describe the system in terms of decision makers, whose “decision” is denoted with er M , p = 1, ...,Mj 
and with tv, v = 1, ..., _F), where p and v label, respectively, the generic p th male and the generic v th female within 
the i province, and Mi and F t represents the total male and female populations within the province i (according to 
the case considered the population will be restricted to natives or to natives and immigrants). More precisely, a^ and 
r„ are taken binary and meant to denote the attitude toward marriage: = +1 and r„ = +1 indicate a positive 

bias to contract marriage and, vice versa, = — 1 and Ty = — 1 indicate a negative bias to contract marriage. These 
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FIG. 3: Refined data analysis for local marriages in France. Panel a: log-log scale plot of LM ai , y versus r ai , y , where different 
regions are denoted in different colours and symbols, as explained in the legend. These data are properly binned (green squares) 
and best-fitted (solid line) by y(T) = a\/Y + 6, in agreement with the theoretical result ([ 2 ]). The best-fit coefficients are a = 1.87 
and b = —4.38 • 10 -4 with R 2 = 0.97. The parameter b is introduced to account for the error (calculated in terms of the 
standard deviation) associated to binned data, which is ~ 5%. In general, the various regions seem to be homogeneously 
scattered around the best-fit curve. In the insets we show, as examples, the data pertaining to two particular regions, namely 
Limousin (upper inset) and Provence-Alpes-Cote d’Azur (lower inset). Notice that for both the insets, the best-fit previously 
obtained for the whole data set (solid line) still provides a proper fit. Panel b: For each region we calculate p a , as defined in 
the text and deepened in Sec. m The horizontal line is drawn as a reference for the unitary value. Notice that the largest 
deviation from the unitary value is for Corsica. From this plot we can distinguish regions displaying a relatively large number 
of marriages (i.e., p a > 1) and regions displaying a relatively small number of marriages (i.e., p a < 1). This division is 
highlighted in the colormap presented in panel c. Interestingly, regions exhibiting analogous deviations share a certain degree 
of geographical proximity: regions with p a > 1 correspond to the North-Eastern border of France, while regions with p a < 1 
corresponds to the Center-Western part of France. Panel d: the two clusters of regions highlighted are analysed separately. For 
each we binned the related raw data and get a best fit, still according to the function J/(F) = aVT + b , obtaining a up = 1.90 
and b up = —6.5 • 10~ 5 ( R 2 = 0.97) for the set of regions with p a > 1, and adown = 1-68 and bdown = — 2.2610 -4 (R 2 = 0.98) 
for the set of regions with p a < 1; notice that a up /adown ~ 1.1. Binned data for the former set (triangles) and for the latter 
set (square) are shown in the main panel, together with the related best fits, in a log-log scale plot. These fits are slightly 
better that the one obtained at the country level, suggesting that possible internal heterogeneities may be rather limited. In 
the insets we compare these best fits with raw data for two regions (Aquitaine and Auvergne) with p a > 1 (upper inset) and 
two regions (Alsace and Franche-Comte) with p a < 1 (lower inset). Notice that in both cases data points overlap both curves, 
again suggesting that the division highlighted here is rather mild. 
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FIG. 4: Refined data analysis for local marriages in Germany. Panel a: log-log scale plot of LM ai:y versus where different 

regions are denoted in different colours and symbols, as explained in the legend. These data are properly binned (green squares) 
and best-fitted (solid line) by y(r) = ay/T + b , in agreement with the theoretical result Q. The best-fit coefficients are a = 1.87 
and b = 8.62 • 10 -5 with R 2 = 0.99. Notice that the parameter b accounts for the standard deviation associated to binned 
data. In general, the various regions seem to be homogeneously scattered around the best-fit curve. In the insets we show, as 
examples, the data pertaining to two particular regions, namely Bayern (upper inset) and Sachsen (lower inset). Notice that 
for both the insets the best-fit previously obtained for the whole data set (solid line) still provides a very good fit. Panel b: 
For each region we calculate p a , as defined in the text and deepened in Sec. |III| The horizontal line is drawn as a reference 
for the unitary value. Notice that for most regions the deviation with respect the unitary value is within the error bar. The 
largest deviation from the unitary value is for two city-states, namely Hamburg and Berlin. Given that all the regions (a few 
cases apart) are compatible with the overall best fit no further analysis on territorial homogeneity is performed. The colormap 
presented in panel c highlights the deviation of p a with respect to the unitary value. 


decisions can be modulated by external influences (Me Fadden scenario) or by peer interactions (Brock and Durlauf 
scenario): in both cases, these ingredients are added in terms of a cost function H({a }, {r}) (i.e an Hamiltonian) to be 
minimized; for the former the Hamiltonian is one-body , that is, it contains no interactions between variables a, r, i.e. 

{r}) ~ v T v)i while for the latter the Hamiltonian contains interactions among agents, i.e. 

H ({<t} , {t}) ~ — ^2^ v cr^iy. Whatever the choice, within this approach, minimization of the cost function is coupled 
with the request to simultaneously maximization of the related resulting entropy. This request, often called Maximum 
Entropy Approach in inferential investigations |37( l38| , is equivalent to assuming the less structured model, whose 
predictions (e.g. first and second moments) are in agreement with the experimental data. Statistical Mechanics has a 
deep dual structure within the classical statistical routes as pointed out by Jaynes mm and returns the probabilistic 
weight for a given configuration {a, t} in terms of the Maxwell-Boltzman factor P(cr, t) oc exp[— H({<r}, {r})]. 
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FIG. 5: Refined data analysis for local marriages in Italy. Panel a: log-log scale plot of LM Qi ^ versus F aii!/ , where different 
regions are denoted in different colours and symbols, as explained in the legend. As discussed in the text, these data are 
particularly noisy, yet one can bin the whole set of data (green squares) and get the best-fitted (solid line). This is given by the 
function y(F) = a%/T + b, in agreement with the theoretical result ([ 2 J. The best-fit coefficients are a = 1.96 and b = -1.59-10 -4 
with R 2 = 0.97. Notice that the parameter b accounts for the standard deviation associated to binned data. It is clear, even 
by visual inspection, that most regions systematically deviate from the best-fit curve. In the insets we show, as examples, the 
data pertaining to two particular regions, namely Emilia-Romagna (upper inset) and Campania (lower inset). Notice that the 
related data lay, respectively, below and above the best-fit curve previously obtained for the whole data set (solid line). Panel 
b\ For each region we calculate p a , as defined in the text and deepened in Sec. |III| The horizontal line is drawn as a reference 
for the unitary value. Notice that for most regions the deviation with respect the unitary value is significant. From this plot 
we can distinguish regions displaying a relatively large number of marriages (i.e., p a > 1) and regions displaying a relatively 
small number of marriages (i.e., p a < 1). This division is highlighted in the colormap presented in panel c. Remarkably, 
regions exhibiting analogous deviations share a sharp geographical proximity: regions with p a > 1 corresponds to Southern 
Italy, while regions with p a < 1 corresponds to Northern Italy. Panel d\ the two clusters of regions highlighted are analysed 
separately. For each we binned the related raw data and get a best fit, still according to the function y(F) = aVT + b, obtaining 
a„ p = 2.29 and b up = —4.39 ■ 1CU 4 ( R 2 = 0.98) for the set of regions with p a > 1, and adown = 1-66 and bdown = —1.52 ■ 10 -4 
(R 2 = 0.99) for the set of regions with p a < 1; notice that a up /adown ~ 1.4. Binned data for the former set (triangles) and for 
the latter set (square) are shown in the main panel, together with the related best fits, in a log-log scale plot. These fits are 
significantly better that the one obtained at the country level, suggesting the existence of internal heterogeneities. In the insets 
we compare these best fits with raw data for two regions (Calabria and Campania) with p a > 1 (upper inset) and two regions 
(Emilia-Romagna and Lombardy) with p a < 1 (lower inset). Notice that the two sets of data overlap only the pertaining fitting 
curve, further suggesting that the division highlighted here is not negligible. 
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FIG. 6: Refined data analysis for local marriages in Spain. Panel a: log-log scale plot of LM aii!) versus R ai , y , where different 
regions are denoted in different colours and symbols, as explained in the legend. These data are properly binned (green squares) 
and best-fitted (solid line) by y(r) = a\/T+b, in agreement with the theoretical result |2|. The best-fit coefficients are a = 1.852 
and b = —5.16 ■ 10~ 4 with R 2 = 0.99. Notice that the parameter b accounts for the standard deviation associated to binned 
data. It is clear, even by visual inspection, that a few regions slightly deviate from the best-fit curve. In the insets we show, 
as examples, the data pertaining to two particular regions, namely Castilla La Mancha (upper inset) and Galicia (lower inset). 
Notice that for both the best-fit previously obtained for the whole data set (solid line) still provides a proper fit. Panel b: For 
each region we calculate p a , as defined in the text and deepened in Sec. m The horizontal line is drawn as a reference for the 
unitary value. Notice that the largest deviation from the unitary value is for Melila. From this plot we can distinguish regions 
displaying a relatively large number of marriages (i.e., p a > 1) and regions displaying a relatively small number of marriages 
(i.e., p a < 1). This division is highlighted in the colormap presented in panel c. Interestingly, regions exhibiting analogous 
deviations share a certain degree of geographical proximity: regions with p a > 1 corresponds to the Southern part of Spain 
and to Basque region (with some adjacent regions), while regions with p a < 1 corresponds to the North-Western part of Spain. 
Panel d\ the two clusters of regions highlighted are analysed separately. For each we binned the related raw data and get a 
best fit, still according to the function y(F) = a%/T + b , obtaining a up = 1.95 and b up = —1.79 • 10~ 4 ( R 2 = 0.97) for the set 
of regions with p a > 1, and adown = 1-84 and bdown = —8.37 • 10~ 4 (R 2 = 0.99) for the set of regions with p a < 1; notice 
that a up /adown ~ 1.1. Binned data for the former set (triangles) and for the latter set (square) are shown in the main panel, 
together with the related best fits, in a log-log scale plot. These fits are not significantly better that the one obtained at the 
country level, suggesting that possible internal heterogeneities may be rather limited. In the insets we compare these best fits 
with raw data for two regions (Andalucia and Comunitat Valenciana) with p a > 1 (upper inset) and two regions (Castilla y 
Leon and Cataluna) with p a < 1 (lower inset). Notice that in both cases data points overlap both curves, again suggesting 
that the division highlighted here is rather mild. 
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FIG. 7: Refined data analysis for local marriages in Switzerland. Panel a: log-log scale plot of LM aii!/ versus T aiiy , where 
different regions are denoted in different colours and symbols, as explained in the legend. These data are properly binned (green 
squares) and best-fitted (solid line) by y(P) = ay/T + 6, in agreement with the theoretical result (Jij). The best-fit coefficients 
are a = 1.20 and b = 1.93 • 1CP 7 with R 2 = 0.99. Notice that the parameter b accounts for the standard deviation associated 
to binned data. In general, the various regions seem to be homogeneously scattered around the best-fit curve. In the insets we 
show, as examples, the data pertaining to two particular regions, namely ... (upper inset) and ... (lower inset). COMPLETA 
Notice that for both the best-fit previously obtained for the whole data set (solid line) still provides a very good fit. Panel b: 
For each region we calculate p Q , as defined in the text and deepened in Sec. m The horizontal line is drawn as a reference 
for the unitary value. Notice that for most regions the deviation with respect the unitary value is within the error bar. The 
largest deviation from the unitary value is for Glarus, which also exhibits the smallest mean value of F. Given that, a few cases 
apart, all the regions are compatible with the overall best fit no further analysis on territorial homogeneity are performed. The 
colormap presented in panel c highlights the deviation of p a with respect to the unitary value 


This inferential estimation for mean-field imitative models has been recently extensively treated in SB. where the 
interested reader may further deepen the foundation of this approach. 


1. Independent model (for mixed marriages) 

In this subsection we discuss the one-body theory (referring to [T5] [25] for extensive details). Since the outcomes 
of this theory have been shown to be in agreement only to the case of mixed marriages (see Sec. 0- we develop the 
mathematical scaffold already thinking at mixed marriages (i.e. MM) and in terms of possible mixed couples (i.e. f l). 
In the independent model , reciprocal influences among decision makers are not allowed for and the expected number of 
mixed marriages can be estimated by a simple probabilistic approach, where only the relative sizes matter. Since the 
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FIG. 8: Mixed marriages MM (that is involving a native citizen, either male or female, with an immigrant) versus the fraction 
of potential couples f l. Here the raw data used to built the points are MMj, a , which is the number of marriages between native 
citizens and immigrants in the province i at time y divided by the total number of marriages (defined as before) in the province 
i at time y, and T;^, that is the number of mixed couples (here the number of native citizens times the number of immigrants) 
in the province i at time y divided by the square of the sum of the two group of people in the province i at time y. The error 
circles should be understood with respect to the legend reported in each graph. To optimize the visualization of the trends, the 
plots were realized dividing the interval of data of each state in bins of the following numbers (we write the name of the country 
followed by the amount of bins used): Germany 25, France 21, Italy 35, Switzerland 21, Spain 27. The red lines in the plots 
are the best fit functions. With respect to Fig. 1, here the different countries do not show the same behaviour: France and 
Germany follow a linear behaviour, which is derived from a non-interacting particles approach, while Spain and Switzerland 
maintain a square-root trend. Italy displays a mixed behaviour which highlights even more the difference between the north 
and south of the country: in fact, the north is best fitted by a square root function while the south by a line. 


(expected) universal linear scaling between the amount of males and females within each province is always respected 
(both for natives and for migrants), we can write (directly at the level of the whole nation, see Fig. 10 and Sec. Ill B) 


MM = h oj(1 — u>) = h II, 


(3) 


where h represent the overall realizability of a mixed marriage (characterizing the couple guest/host country) and 
w(l — oj) is the probability of a mixed couple meeting, drawn indipendently from a box of oj (fraction of natives) and 
1 — oj (fraction of foreign-borns) citizens. 

This behaviour can be encoded as well with a one-body cost function depending on an external field only: calling 
a/ = ±1, (/i = 1, ...,Mj), the positive (a = +1) -or negative (a = —1)- attitude of the {i th male agent within the 
province i to contract the marriage, calling t* = ±1, (u = 1, ...,.F)) to account for the will of female decision makers 
and, finally, introducing also hfl as properly field containing the probability of a mixed encounter, the model is coded 
into 


N P /Mi Fi \ 

^ (1) (w,w) = -E^ ( 4 ) 

i = 1 V=1 v=l ) 

thus energy minimization simply suggests that the agents on average try to behave accordingly to the external 
suggestions encoded in the fields h , that is, positive values of h favor marriages and viceversa (and the larger the 
value, the stronger the effect). Solving one-body models is straightforward in Statistical Mechanics and returns a 
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trend for mixed marriages as 


MM oc - , 

N P 't—' 
1=1 



(5) 


namely marriages driven mainly by external influences (over peer interactions) are expected to scale linearly in the 
volume of possible couples. 


2. Model with social interactions (for local marriages) 


In this subsection in turn we discuss the two-body theory (referring to [121122 for extensive details): as outcomes of 
this theory apply to all the local marriages, we develop the mathematical scaffold already thinking at local marriages 
(i.e., LM) and in terms of possible autochthonous couples (i.e. T). 

In order to develop the theory leading to Eq. [2J we need to introduce a (two-body) Hamiltonian H^ 2 \{cr}, {r}), 
describing the interactions among males and females |54| . Each individual /x, v is associated to a dichotomic variable 
or “spin” (referred to as a M = ±1 for males and as r v = ±1 for females) encoding a positive (i.e. +1) or negative (i.e. 
— 1) attitude to marriage. Then, the “cost” of a given configuration ({cr}, {r}) is provided by the global Hamiltonian 


U (2) (M,{r}) 


J 

N 



EEo 




i =1 \fi=l is=l 


( 6 ) 


where N = ^ t (Mi + Ff) is the total population in the whole country and J tuning the interaction strenght. This 
Hamiltonian is simply the sum of terms like —cr^r* over all the possible couples (/x, v) of individuals belonging to 
the same province i. Clearly, the underlying mean field assumption of a fully connected network is only a working 
approximation as real social networks are expected to be small worlds I43H45) . however, for simple enough interaction 
rules (as the imitative one coded in eq. ([6]) in the present work) the general scenario (i.e. criticality, scaling, etc.) 
provided by the mean field approximation suitably approximates the more realistic one obtained embedding the system 
on a small world network [22 136H38] . 

Intuitively, for the minimum energy principle -that tries to keep the numerical value of H({a}, {r}) at its minimum 
[6j- the function ([6]) favors the configurations of citizens with the overall lowest possible frustration, that is, considering 
the couple (/x, v) as an example, it favors the two states where the variables a M and r„ are aligned, thus ay = r„ = +1 
or ay = t v = —1, while misaligned couples ay 7^ t v are unfavored. This captures the trivial observation that, within 
any country, stable couples (ay = r„ = +1) as well as stable not-couples (ay = r v = —1) exist |55] - On the other 
hand, the conflicting situation where one of the two partners wants to get married (say oy = +1) but the other does 
not (hence 7y = —1) is only transitory (hence not stable) as, after a proper timescale, the pretender is expected to 
move toward another target (pathological cases apart, whose presence may act as a small noise over the mean results). 
Once the Hamiltonian ruling the phenomenon is assigned, this allows introducing in the standard way the partition 
function Z related to the present case (and whose explicit expression permits to obtain all the desired information) as 


z = EE exp [ _ff (W’W)i > 

M HI 


(7) 


where the sum is performed over all possible JdE 2 Mi x 2 Fi configurations. By a direct calculation, it is straightforward 
to check that 


Z N = EE exp 

M I 1- } 


J 

N 



EE 

fJL=l V= 1 




E ex P 

M 



( 8 ) 


where Tj = MiFi/N and we highlighted the term nm = (X^* oy)/Mj, that measures the average propensity to 
get married for males within the province i. This quantity works as the “order parameter” of the theory and it 
is expected to be proportional to the experimental estimate of the amount of marriages LM^ within the province i 
(suitably normalized). Note that, in order to study the joint evolution of male and female attitudes to marriage, the 
requirement of heterosexual marriages implicitly allows to study the average propensity of only one of the two parties 
(here the male one, via m t , while clearly the same results are achievable using female magnetization as the order 
parameter). 
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Before proceeding it is worth stressing the social meaning of the last and crucial passage in eq. (|8|)[56j: this 
equivalence is trivially obtained by performing the summation over the r variables (i.e., by integrating over the “female 
degrees of freedom”), that returns a term cosh [J(M i /N)mi] Fi \ the latter is then written using cosh(s) = exp [In cosh (a:)] 
and then Taylor-expanding at the leading term as In cosh(rr) ~ x 2 /2. 

Remarkably, such an equivalence states that the initial model described by eq. § , meant for males and females in 
interaction and encoded by sums of terms oc — is statistically equivalent to a model accounting for imitative 
interactions among males only, thus encoded by terms oc that is — mf. Otherwise stated, the phenomenological 

rule described by the cost function where males and females interact trying to satisfy their relationships, recovers 
the copy model theory, that is, the discrete choice with imitation 24, 09] in socio-economic literature (or Hebbian 
ferromagnetism in statistical mechanics literature Email EH EB). Of course, and in complete analogy, we could reach 
imitation among females only, by summing over the cr variables first. Discrete-choices with imitation, where agents 
(here males) interact pair-wise in an imitative fashion, is well known in statistical mechanics (as the Curie-Weiss model 
@211511) as well as in quantitative sociology (as the Brock-Daurlauf theory [28]): for this model, where interactions 
between citizens from different provinces were neglected, the expected (i.e. averaged) behavior of the order parameter 
TOj depends only on the parameter in the form 

m = tanh [ J 2 V • m] , (9) 

By Taylor-expanding the above equation for small to we get to oc -\/r and identifying the theoretical order parameter 
to. with the experimental one LM, we obtain the expected leading trend 


LM ~ Vf, (10) 

This equation predicts, within each country (namely under the assumption of homogeneity among the <j as well as the 
r variables), a square-root growth for the expected number of marriages LM versus the density of potential couples 
T and can be compared directly with experimental data. 


B. The experimental protocol 

Even the experimental protocol is split into two main parts. In the first one we revise how we elaborated the time 
series to recover observables to be framed in the Statistical Mechanics model (i.e. the results presented in Fig. |T|) , that 
is also the route paved in US 123; then, in the second section we discuss novel inferential techniques we developed to 
reveal potential regional clusters (used to produce Figures |3][7|. 
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1. General scenario 




FIG. 9: Schematic representation of the procedure performed in data analysis. For each country (here we show only local 
marriages in Spain as test case) the time series for fraction of marriages and fraction of couples are considered, as shown in the 
left panel. Each dot corresponds to a given province and a given year. We check that, for each province i, Ti ty (or fl according 
to the case considered) is monotonic versus time, so to allow the inversion that returns time versus T (not shown). Then, 
time dependence is by-passed by plotting -as raw data- the fraction of marriages versus the fraction of couples as shown in the 
central panel. Lastly we bin the latter to obtain the coarse grained evolution of the phenomenon, whose data points (yellow 
circles) have been best fitted against the theory (solid line). 


We collected data for the number of marriages (local and mixed) and for the female and male populations (native and 
foreign-born) in France (source: INSEE), Germany (source: DESTATIS), Italy (source: ISTAT), Spain (source: INE), 
and Switzerland (source: FSO) over the time windows [2006—2010], [2007—2012], [2005 — 2010], [1996,1998—2004][57]. 
[2008 — 2010], respectively. Time series are collected at the provincial level. This degree of resolution is determined by 
the empirical observation that marriages typically involve people living in the same province (i.e., the social interactions 
display a characteristic geographic scale m) ; further, this allows, in turn, treating each province independently from 
the others and thus performing statistical analysis over the set of provinces meant as different realisations of the same 
system. 

Therefore, for any given country, we have Np provinces and for each, labeled as i , we collected the data series 
{LMj iy }, {MMi iS }, {r, and where y is a time index, running over the time window considered (we recall 

that data are sampled yearly). More precisely, 

• LMj y represents the ratio between the number of local marriages (i.e., between two autochthonous people) registered 
in the province i at time y and the number of all marriages (i.e., including local marriages, mixed marriages and 
marriages between two foreign-born people) registered at time y\ 

• MM, ; , y represents the ratio between the number of mixed marriages (i.e., between one native and one foreign-born) 
registered in the province i at time y and the number of all marriages (i.e., including local marriages, mixed marriages 
and marriages between two foreign-born people) registered in the province i at time y; 

• Ti tV represents the number of possible couples in the province i at time y and is defined as the the number of males 
times the number of females (residing in the province i at time y) divided by the square of the overall population 
(residing in the country at time y)\ 

• represents the number of possible mixed couples in the province i at time y and is defined as the the number 
of natives times the number of foreign-born (residing in the province i at time y) divided by the square of the overall 
population (residing in the province i at time y). 

We stress that data on marriages account only for heterosexual marriages and that Fj i3/ accordingly refers to 
heterosexual couples. On the other hand, by definition, fincludes all possible pairs. This choice is meant to 
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simplify the treatment, since in the Statistical Mechanics analysis one has to deal with a bipartite model rather than 
a system made of four parties. This simplifications is justified by the following empirical observation (see Fig. 101: 
the immigrant community is approximately made of half males and half females and this ratio is independent of the 
extent of the community. By the way, the same is evidenced also for the native community as shown in Fig. |10| too. 
As a result, denoting with Ff the number of female foreign-born agents, with Mf the number of male foreign- 
born agents, with F n the number of female native agents, and with M n the number of male foreign-born, one has 


Ff ss Mf ss (Ff + Mf)/ 2 and F n 


M n 


of possible heterosexual mixed couples can therefore be written as 


(F n + M n )/ 2. Also, f 1 ~ (Ff + Mf)/(Ff + Mf + F n + M n ). The fraction 


M n ■ F f + Mf • F n 
(M n + M f ) ■ (F n + F f ) 


2w(l -to) = 2Cl. 


( 11 ) 


Thus, it is reasonably to take f1 as an estimate for the number of possible mixed couples. 

The time series described above are then manipulated according to a lengthy but simple protocol, largely discussed 
in dam and summarized in Figure [9} leading to a set of binned data points for local and mixed marriages versus 
r and Cl, respectively. The dependence on the province i is lost during manipulation and under the hypothesis that 
provinces making up the same country are sufficiently homogeneous (this point is further deepened below). The 
dependence on y is lost as well or, otherwise stated, it accounts for the variation of F and Cl. 

In this way we extract the average evolution of LM versus T and of MM versus Cl, and these are then best-fitted 
against the laws stemming from the theoretical analyses, namely the square-root law Eq. [l0]and the linear law Eq. [5] 
Results for local and mixed marriages are shown in Figs. |T] and [HJ respectively, and are discussed in Sec. [ITJ 



FIG. 10: Panel a: log-log scale plot for the number of native females versus the number of native males. Binned data for 
different countries are represented in different symbols and colours as explained in the legend; the same legend holds for panels 
b, c, and d as well. The solid line represents the best fit given by y = ax + b, where a « 1.03. Panel b: log-log scale plot for 
number of foreign-born females versus the number of foreign-born males; each point represent the average value over the whole 
country and for a different year. The solid line represents the best fit given by y = ax + b, where a « 0.97. Panel c: scatter 
plot for the ratio between the number of native males and the number of native females. Each data point represents the value 
obtained for a different year and province. The solid line represents the unitary value as a reference. Panel d: scatter plot for 
the ratio between the number of foreign-born males and the number of foreign-born females. Each data point represents the 
value obtained for a different year and province. The solid line represents the unitary value as a reference. 


2. Cluster detection 

Once the overall behavior at the country level has been inferred, there is still to face the homogeneity issue, that is, 
the existence of regions displaying systematic differences with respect to the average results. The analysis is meant 
to highlight the potential presence of inner differences, possible heritages of ancient different cultural traditions, thus 
it is performed only on local marriages. 
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Figure [l] is obtained averaging over all the provinces within the country, as described in the previous subsection. 
The binned data points are then best-fitted versus the function f(T) = a\/T + b in agreement with the theoretical 
model ( [To] ) . The next step is to refine the level of resolution and to distinguish different regions; the behavior of 
each region is then compared to the average one (i.e. the one obtained at the country level). Therefore, the above 
mentioned time series have been reshuffled as {r aij . y } and {LM Qijy }, where on denotes the i -th province in the a-th 
region. Of course, each country displays a different number Nr of regions (Nr = 22 for France, Nr = 16 for Germany, 
Nr = 20 for Italy, Nr = 18 for Spain, Nr = 26 for Switzerland) and each region a includes a different number Np a 
of provinces. 

In Figures |3|7] the set of data {LM aiiy } is plotted versus {r Qiiy } and each region is depicted in a different color. In 
this way one can immediately detect anomalous behaviours with respect to the average behaviour represented by the 
function y = f(F). For instance, with this method, we recover that Corsica (see [3]) and Melila (see [6] do not match 
the reference curve, as somehow expected given the peculiarity of these regions. In order to quantify the goodness of 
the estimate provided by the best fit, we introduce the observable p ai , y and its average p a defined, respectively, as 


/( r„,a) 

Pai ’ v - LM ai , y ’ 

(12) 

1 Np 

pa = N^^2 Pa '- 

(13) 


i= 1 


In a country where people behave homogeneously throughout its territory one expects that p ai fluctuates randomly 
around 1, in such a way that its average p a is close to one. This is the case for regions in Germany and in Switzerland, 
see Fig. |TT] (upper panels). On the other hand, in a country displaying internal heterogeneities, we expect that for 
some regions (possibly forming a connected cluster) p ai is systematically above (or below) 1, in such a way that its 
average p a is far from one. This is the case for regions in France, Italy and Spain, see Fig. [TT| (lower panels). 

Notice that p a > 1 means that the global best-fit overestimates the number of marriages in the region a, while 
p a < 1 means that the global best-fit underestimates the number of marriages in the region a. Therefore, the simplest 
distinction one can introduce is between regions where the number of marriages is relatively low (i.e., p a > 1) and 
regions where the number of marriages is relatively high (i.e., p a < 1)- This distinction is performed for France, Italy 
and Spain, as outlined in Figs. !Si Interestingly, regions displaying the same trend constitute clusters. 

Each cluster is then treated separately, and for each a new best-fit is found. The same analysis as before are 
repeated and again compared with data. A true shift is actually evidenced only for Italy. 
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FIG. 11: The dispersion of the observable p ai is shown for the region Rheinland-Pfalz in Germany (upper panel) and for the 
regions Campania and Emilia-Romagna in Italy (lower panel). More precisely, in panel a we show the set of data LM ai versus 
r ai , where a; here runs over all the x provinces in Rheinland-Pfalz; the solid black line represents the best-fit obtained over 
the whole set of data pertaining to Germany. Notice that the best-fit is the same already presented in Fig. [4] In panel b we 
show p ai obtained by dividing each data point LM ai in panel a by the related expected value /(r ai ). Notice that the values 
of p ai are scattered randomly around 1. The same passages are applied, mutatis mutandis, in panels c and d; the legend and 
the best fit are the same as in Fig. [5] Notice that now data pertaining to Campania and to Emilia-Romagna are systematically 
above and below, respectively, the best-fit line. Accordingly, the related values of p ai are always larger and smaller than 1, 
respectively. 


IV. CONCLUSIONS 

In this paper we analyzed, within a statistical mechanical framework, the way the number of marriages evolves in 
five different European countries; we consider local marriages (i.e. involving two autochthonous spouses) and mixed 
marriages (i.e. involving a native spouse and a foreign-born spouse). The inspected countries are France, Germany, 
Italy, Spain and Switzerland. Instead of classical historical series analysis (i.e. the study of temporal dynamics of the 
social quantifier considered), for each of these countries we studied the evolution in the density of marriages (local and 
mixed) versus the percentage of potential couples (both natives or mixed, respectively) and we find that the results 
can all be framed within the two extrema scenarios: 

- Me Fadden Discrete Choice: each agent decides regarding his/her will to get married essentially without relying 
on peer choices. Within this framework, elementary statistical mechanical calculations predict a linear correlation 
between the density of marriages and the density of potential couples present in the territory. 

- Brock and Durlauf Imitational Choice: each agent decides according to imitational mechanisms based on peer choices. 
Within this framework, statistical mechanics of imitative models (i.e. ferromagnetic models) predicts a square root 
relation between the density of marriages and the density of potential couples present in the territory. 

After a proper data manipulation and model calibration we find that, as far as local marriages are concerned, in all 
the analyzed nations the Brock and Durlauf scenario (discrete choice wit social interactions) prevails: at the national 
level, the relation between local marriages and potential couples always follows a square root, apart for the case of 
Italy. In the latter, two well-separated square root growths appear and mirror the marriage evolution of two detached 
communities that, remarkably, do coincide with the Northern and Southern regional clusters into which the peninsula 
were split before its unification in 1861. This finding may suggest that Italy still experiences persistence of lasting 
heritages of different cultures that have not mixed yet. 

Inspired by the Italian case , to deepen the possible existence of regional clusters (i.e. ensembles of geographically 
adjacent regions sharing close behaviors) within each analyzed country, we then moved to study data at the regional 
level, collecting the trend in their marriage evolution for each region and then comparing regional behavior with 
the averaged-national one. In this way it was possible to distinguish regions where the evolution of marriages is, 
respectively, consistent with, overestimated, or underestimated by the average (i.e., at the country level) behavior. 
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Interestingly, we also find that regions with analogous outcomes typically form connected clusters. While this effect 
is mild for France and Spain, it becomes manifest for Italy, highlighting the net presence of behavioral differences 
between its Northern and Southern regions. Such heterogeneity is absent in Germany and Switzerland, where the 
behavior of all the regions, within the experimental error, fall into the main class paved by the national reference. 

Moving to mixed marriages, we found that the Brock and Durlauf scenario becomes much less pronounced, possibly 
surviving only in Spain (for low values of possible mixed couples) and in Switzerland (which is somehow a case by 
itself given that its immigration policies do not have to overlap with EU prescriptions). 

For all the other countries we found that it is the Me Fadden theory to reproduce remarkably well the behavior (i.e. in 
France and Germany the relation between marriages and potential -mixed- couples follows a sharp straight line, while 
in Italy noise in the data forbids to make sharp statements). This possibly suggests that imitational mechanisms, 
widespread among decision makers within modern societies, may require a higher level of integration when migrants 
are concerned as their establishment in these mixed cases seems not to have taken place yet. 
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